\subsection{Variance}
Let \(X\) be a random variable, and \(r \in \mathbb N\).
If it is well-defined, we call \(\expect{X^r}\) the \(r\)th moment of \(X\).
We define the variance of \(X\) by
\[
	\Var{X} = \expect{(X - \expect{X})^2}
\]
If the variance is small, \(X\) is highly concentrated around \(\expect{X}\).
If the variance is large, \(X\) has a wide distribution including values not necessarily near \(\expect{X}\).
We call \(\sqrt{\Var{X}}\) the standard deviation of \(X\), denoted with \(\sigma\).
The variance has the following basic properties:
\begin{itemize}
	\item \(\Var{X} \geq 0\), and if \(\Var{X} = 0\), \(\prob{X = \expect{X}} = 1\).
	\item If \(c \in \mathbb R\), then \(\Var{cX} = c^2\Var{X}\), and \(\Var{X + c} = \Var{X}\).
	\item \(\Var{X} = \expect{X^2} - \expect{X}^2\).
	      This follows since
		  \begin{align*}
			\expect{(X - \expect{X})^2} &= \expect{X^2 - 2X\expect{X} + \expect{X}^2} \\
			&= \expect{X^2} - 2\expect{X} \expect{X} + \expect{X}^2 \\
			&= \expect{X^2} - \expect{X}^2
		  \end{align*}
	\item \(\Var{X} = \min_{c \in \mathbb R} \expect{(X - c)^2}\), and this minimum is achieved at \(c = \expect{X}\).
	      Indeed, if we let \(f(c) = \expect{(X - c)^2}\), then \(f(c) = \expect{X^2} - 2c\expect{X} + c^2\).
	      Minimising \(f\), we get \(f(\expect{X}) = \Var{X}\) as required.
\end{itemize}
As an example, consider \(X \sim \mathrm{Bin}(n, p)\).
Then \(\expect{X} = np\), as we found before.
Note that we can also represent this binomial distribution as the sum of \(n\) Bernoulli distributions of parameter \(p\) to get the same result.
The variance of \(X\) is
\[
	\Var{X} = \expect{X^2} - \expect{X}^2
\]
In fact, in order to compute \(\expect{X^2}\) it is easier to find \(\expect{X(X-1)}\).
\begin{align*}
	\expect{X(X-1)} & = \sum_{k=2}^n k \cdot (k-1) \cdot \binom{n}{k} \cdot p^k \cdot (1-p)^{n-k} \\
	                & = \sum_{k=2}^n \frac{k(k-1)n!
	p^k (1-p)^{n-k}}{(n-k)!k!}                                                                    \\
	                & = \sum_{k=2}^n \frac{n!
	p^k (1-p)^{n-k}}{((n-2)-(k-2))!(k-2)!}                                                        \\
	                & = n(n-1)p^2 \sum_{k=2}^n \binom{n-2}{k-2} p^{k-2} (1-p)^{n-k}               \\
	                & = n(n-1)p^2
\end{align*}
Hence,
\[
	\Var{X} = \expect{X(X-1)} + \expect{X} - \expect{X}^2 = n(n-1)p^2 + np - (np)^2 = np(1-p)
\]
As a second example, if \(X \sim \text{Poi}(\lambda)\), we have \(\expect{X} = \lambda\).
Because of the factorial term, it is easier to use \(X(X-1)\) than \(X^2\).
\begin{align*}
	\expect{X(X-1)} & = \sum_{k=2}^\infty k(k-1) e^{-\lambda} \frac{\lambda^k}{k!}                  \\
	                & = e^{-\lambda} \sum_{k=2}^\infty \frac{\lambda_{k-2}}{(k-2)!} \cdot \lambda^2 \\
	                & = \lambda^2
\end{align*}
Hence,
\[
	\Var{X} = \lambda^2 + \lambda - \lambda^2 = \lambda
\]

\subsection{Covariance}
\begin{definition}
	Let \(X\) and \(Y\) be random variables.
	Their covariance is defined
	\[
		\Cov{X,Y} = \expect{(X - \expect{X})(Y - \expect{Y})}
	\]
	It is a measure of how dependent \(X\) and \(Y\) are.
\end{definition}
Immediately we can deduce the following properties.
\begin{itemize}
	\item \(\Cov{X,Y} = \Cov{Y,X}\)
	\item \(\Cov{X,X} = \Var{X}\)
	\item \(\Cov{X,Y} = \expect{XY} - \expect{X}\cdot\expect{Y}\).
	      Indeed, \((X - \expect{X})(Y - \expect{Y}) = XY - X\expect{Y} - Y\expect{X} + \expect{X}\expect{Y}\) and the result follows.
	\item Let \(c \in \mathbb R\).
	      Then \(\Cov{cX,Y} = c\Cov{X,Y}\), and \(\Cov{c + X,Y} = \Cov{X,Y}\).
	\item \(\Var{X + Y} = \Var{X} + \Var{Y} + 2\Cov{X,Y}\).
	      Indeed, we have

	      \(\Var{X + Y} = \expect{(X - \expect{X} + Y - \expect{Y})^2}\) which gives

	      \(\expect{(X - \expect{X})^2} + \expect{(Y - \expect{Y})^2} + 2\expect{(X - \expect{X})(Y - \expect{Y})}\) as required.
	\item For all \(c \in \mathbb R\), \(\Cov{c, X} = 0\)
	\item If \(X\), \(Y\), \(Z\) are random variables, then \(\Cov{X + Y,Z} = \Cov{X,Z} + \Cov{Y,Z}\).
	      More generally, for \(c_1, \dots, c_n, d_1, \dots, d_m\) real numbers, and for \(X_1, \dots, X_n, Y_1, \dots, Y_m\) random variables, we have
	      \[
		      \Cov{\sum_{i=1}^n c_i X_i,\sum_{j=1}^m d_j Y_j} = \sum_{i=1}^n \sum_{j=1}^m c_i d_j \Cov{X_i,Y_j}
	      \]
	      In particular, if we apply this to \(X_i = Y_i\), and \(c_i = d_i = 1\), then we have
	      \[
		      \Var{\sum_{i=1}^n X_i} = \sum_{i=1}^n \Var{X_i} + \sum_{i \neq j} \Cov{X_i,X_j}
	      \]
\end{itemize}

\subsection{Expectation of functions of a random variable}
Recall that \(X\) and \(Y\) are independent if for all \(x\) and \(y\),
\[
	\prob{X = x, Y = y} = \prob{X = x} \cdot \prob{Y = y}
\]
We would like to prove that given positive functions \(f, g \colon \mathbb R \to \mathbb R_+\), if \(X\) and \(Y\) are independent we have
\[
	\expect{f(X)g(Y)} = \expect{f(X)} \cdot \expect{g(Y)}
\]
\begin{proof}
	\begin{align*}
		\expect{f(X)g(Y)} & = \sum_{(x, y)} f(x) g(y) \prob{X = x, Y = y}                 \\
		                  & = \sum_{(x, y)} f(x) g(y) \prob{X = x} \prob{Y = y}           \\
		                  & = \sum_{x} f(x) \prob{X = x} \cdot \sum_{y} g(y) \prob{Y = y} \\
		                  & = \expect{f(X)} \cdot \expect{g(Y)}
	\end{align*}
\end{proof}
The same result holds for general functions, provided the required expectations exist.

\subsection{Covariance of independent variables}
Suppose \(X\) and \(Y\) are independent.
Then
\[
	\Cov{X,Y} = 0
\]
This is because
\begin{align*}
	\Cov{X,Y} & = \expect{(X - \expect{X})(Y - \expect{Y})}             \\
	          & = \expect{X - \expect{X}} \cdot \expect{Y - \expect{Y}} \\
	          & = 0 \cdot 0                                             \\
	          & = 0
\end{align*}
In particular, we can deduce that
\[
	\Var{X + Y} = \Var{X} + \Var{Y}
\]
Note, however, that the covariance being equal to zero does not imply independence.
For instance, let \(X_1, X_2, X_3\) be independent Bernoulli random variables with parameter \(\frac{1}{2}\).
Let us now define \(Y_1 = 2X_1 - 1\), \(Y_2 = 2X_2 - 1\), and \(Z_1 = X_3 Y_1\), \(Z_2 = X_3 Y_2\).
Now, we have
\[
	\expect{Y_1} = \expect{Y_2} = \expect{Z_1} = \expect{Z_2} = 0
\]
We can find that
\[
	\Cov{Z_1, Z_2} = \expect{Z_1 \cdot Z_2} = \expect{X_3^2 Y_1 Y_2} = \expect{X_3^2} \cdot 0 \cdot 0 = 0
\]
However, \(Z_1\) and \(Z_2\) are in fact not independent.
Since \(Y_1, Y_2\) are never zero,
\[
	\prob{Z_1 = 0, Z_2 = 0} = \prob{X_3 = 0} = \frac{1}{2}
\]
But also
\[
	\prob{Z_1 = 0} = \prob{Z_2 = 0} = \prob{X_3 = 0} = \frac{1}{2} \implies \prob{Z_1 = 0} \cdot \prob{Z_2 = 0} = 0
\]
So the events are not independent.
